Mamba MOE Quant Configs + Fix Export Bug #882
Conversation
📝 Walkthrough: This PR adds Mamba MOE-specific quantization configuration variants for the FP8 and NVFP4 quantizers, and fixes export module exclusion pattern handling by stripping trailing dots from prefix names.
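The trailing-dot fix described above can be sketched as follows. The function name and call shape are illustrative assumptions, not the actual ModelOpt export code; the sketch only demonstrates the normalization the PR describes:

```python
def normalize_exclude_patterns(exclude_modules: list[str]) -> list[str]:
    """Strip trailing dots from module-prefix exclude patterns.

    Hypothetical sketch: a prefix assembled as name + "." (e.g.
    "decoder.layers.0.") would never match the exported module name
    "decoder.layers.0", so the module would not be excluded as intended.
    """
    return [pattern.rstrip(".") for pattern in exclude_modules]
```

Exact-name patterns without a trailing dot pass through unchanged.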
Codecov Report: ✅ All modified and coverable lines are covered by tests.

```
@@           Coverage Diff           @@
##             main     #882   +/-   ##
=======================================
  Coverage   73.73%   73.74%
=======================================
  Files         199      199
  Lines       21165    21170    +5
=======================================
+ Hits        15606    15611    +5
  Misses       5559     5559
```

☔ View full report in Codecov by Sentry.
**ChenhanYu** left a comment:

> The formatting test is failing.
Force-pushed from 700c32d to 5b983e3.
Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>
Force-pushed from 801881e to c9fb020.
## What does this PR do?

**Type of change:** Bug fix

**Overview:**
- Fix a bug in MCore export `exclude_modules` where the layer prefixes had an extra period at the end
- Add custom quant configs for Mamba MOEs

## Usage

```python
# Add a code snippet demonstrating how to use this
```

## Testing

## Before your PR is "*Ready for review*"

- **Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)** and your commits are signed.
- **Is this change backward compatible?**: Yes/No
- **Did you write any new necessary tests?**: Yes/No
- **Did you add or update any necessary documentation?**: Yes/No
- **Did you update [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?**: Yes/No

## Additional Information

## Summary by CodeRabbit

* **New Features**
  * Added four new Mamba MOE quantization configurations: aggressive and conservative variants for both FP8 and NVFP4 quantization schemes.
* **Bug Fixes**
  * Export module exclusion patterns are now normalized by stripping trailing dots.

---

Signed-off-by: Jennifer Chen <jennifchen@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
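As a rough illustration of what one of the new config variants might look like: everything below (the config name, wildcard patterns, and which modules a "conservative" variant leaves unquantized) is an assumption written in the general dict style of ModelOpt quant configs, not the PR's actual definitions:

```python
# Hypothetical FP8 "conservative" variant for Mamba MOE models.
# All names and patterns here are illustrative assumptions.
MAMBA_MOE_FP8_CONSERVATIVE_CFG = {
    "quant_cfg": {
        # FP8 (E4M3) for linear-layer weights and activations
        "*weight_quantizer": {"num_bits": (4, 3), "axis": None},
        "*input_quantizer": {"num_bits": (4, 3), "axis": None},
        # Conservative: keep Mamba state-space mixers and MOE routers unquantized
        "*mixer*": {"enable": False},
        "*router*": {"enable": False},
        "default": {"enable": False},
    },
    "algorithm": "max",
}
```

An "aggressive" variant would presumably enable quantization on more of these module patterns; the FP8 vs. NVFP4 split would change the quantizer settings rather than the overall structure.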